NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Hamiltonicity of sparse pseudorandom graphs

https://doi.org/10.1017/S0963548325000070

Ferber, Asaf; Han, Jie; Mao, Dingjia; Vershynin, Roman (July 2025, Combinatorics, Probability and Computing)

Free, publicly-accessible full text available July 1, 2026
Covariance loss, Szemeredi regularity, and differential privacy

https://doi.org/10.1090/proc/17126

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (February 2025, Proceedings of the American Mathematical Society)

We show how randomized rounding based on Grothendieck’s identity can be used to prove a nearly tight bound on the covariance loss, that is, the amount of covariance that is lost by taking conditional expectation. This result yields a new type of weak Szemeredi regularity lemma for positive semidefinite matrices and kernels. Moreover, it can be used to construct differentially private synthetic data.
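As a rough numerical illustration of the definition of covariance loss (not the paper's randomized-rounding construction or its bound), the sketch below treats an empirical sample as the probability space and a hypothetical partition of it into cells as the σ-algebra. Replacing each point by its cell mean is the conditional expectation, and the covariance loss is the difference of the two covariance matrices. By the law of total covariance this difference equals the average within-cell covariance, so it is positive semidefinite. The function name `covariance_loss`, the use of NumPy, and the random partition are assumptions made only for this example.

```python
import numpy as np

def covariance_loss(X: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Return Cov(X) - Cov(E[X | partition]) for an empirical sample.

    X has shape (n, d); labels[i] is the (hypothetical) partition cell of row i.
    Each row has weight 1/n, so E[X | cell] is simply the cell mean.
    """
    cond = np.empty_like(X, dtype=float)
    for c in np.unique(labels):
        mask = labels == c
        cond[mask] = X[mask].mean(axis=0)  # conditional expectation on this cell
    # Population (1/n) covariances; cond has the same mean as X.
    cov_x = np.cov(X, rowvar=False, bias=True)
    cov_cond = np.cov(cond, rowvar=False, bias=True)
    return cov_x - cov_cond

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
labels = rng.integers(0, 10, size=1000)  # a random partition into 10 cells
loss = covariance_loss(X, labels)
print(np.linalg.eigvalsh(loss).min() >= 0)  # True: the loss is PSD
```

Coarser partitions lose more covariance: with a single cell the conditional expectation is constant and the whole covariance is lost, while with one point per cell nothing is lost.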
Free, publicly-accessible full text available February 1, 2026
Differentially private low-dimensional synthetic data from high-dimensional datasets

https://doi.org/10.1093/imaiai/iaae034

He, Yiyun; Strohmer, Thomas; Vershynin, Roman; Zhu, Yizhe (January 2025, Information and Inference: A Journal of the IMA)

Differentially private synthetic data provide a powerful mechanism to enable data analysis while protecting sensitive information about individuals. However, when the data lie in a high-dimensional space, the accuracy of the synthetic data suffers from the curse of dimensionality. In this paper, we propose a differentially private algorithm to generate low-dimensional synthetic data efficiently from a high-dimensional dataset, with a utility guarantee with respect to the Wasserstein distance. A key step of our algorithm is a private principal component analysis (PCA) procedure with a near-optimal accuracy bound that circumvents the curse of dimensionality. Unlike the standard perturbation analysis, our analysis of private PCA works without assuming a spectral gap for the covariance matrix.
Full Text Available
Private measures, random walks, and synthetic data

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (April 2024, Probability Theory and Related Fields)

Full Text Available
Private measures, random walks, and synthetic data

https://doi.org/10.1007/s00440-024-01279-z

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (April 2024, Probability Theory and Related Fields)

Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex, but very common, machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data in general compact metric spaces, for any fixed privacy budget ε bounded away from zero. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmically slowly.
Covariance's Loss is Privacy's Gain: Computationally Efficient, Private and Accurate Synthetic Data

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (February 2024, Foundations of Computational Mathematics)

Full Text Available
The quarks of attention: Structure and capacity of neural attention building blocks

https://doi.org/10.1016/j.artint.2023.103901

Baldi, Pierre; Vershynin, Roman (June 2023, Artificial Intelligence)

Full Text Available
Algorithmically Effective Differentially Private Synthetic Data

He, Yiyun; Vershynin, Roman; Zhu, Yizhe (July 2023, 36th Annual Conference on Learning Theory)

We present a highly effective algorithmic approach for generating ε-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance.
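The utility guarantee above is measured in the 1-Wasserstein distance. For intuition only, here is a minimal sketch of that distance in the simplest setting: two uniform empirical measures on the real line with the same number of points, where optimal transport matches the sorted samples in order. It is not the paper's algorithm and involves no privacy mechanism; the function name `w1_empirical_1d` and the equal-size restriction are assumptions of this example.

```python
import numpy as np

def w1_empirical_1d(x, y) -> float:
    """1-Wasserstein distance between two uniform empirical measures on R.

    Assumes (for this sketch) that x and y contain the same number of points.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal transport plan pairs order statistics.
    return float(np.mean(np.abs(x - y)))

print(w1_empirical_1d([0.0, 1.0, 2.0], [2.5, 0.5, 1.5]))  # 0.5
```

Unequal sample sizes or higher dimensions require solving a general optimal transport problem, which this sketch does not attempt.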
Full Text Available
AVIDA: An alternating method for visualizing and integrating data

https://doi.org/10.1016/j.jocs.2023.101998

Dover, Kathryn; Cang, Zixuan; Ma, Anna; Nie, Qing; Vershynin, Roman (April 2023, Journal of Computational Science)

Full Text Available
AVIDA: Alternating method for Visualizing and Integrating Data

Dover, Kathryn; Cang, Zixuan; Ma, Anna; Nie, Qing; Vershynin, Roman (April 2023, Journal of Computational Science)

High-dimensional multimodal data arise in many scientific fields. Integrating multimodal data becomes challenging when there is no known correspondence between the samples and the features of different datasets. To tackle this challenge, we introduce AVIDA, a framework for simultaneously performing data alignment and dimension reduction. In the numerical experiments, Gromov-Wasserstein optimal transport and t-distributed stochastic neighbor embedding are used as the alignment and dimension reduction modules, respectively. Using four synthesized datasets and two real multimodal single-cell datasets, we show that AVIDA correctly aligns high-dimensional datasets that share no common features. We also demonstrate that, compared to several existing methods, AVIDA better preserves the structures of individual datasets, especially distinct local structures in the joint low-dimensional visualization, while achieving comparable alignment performance. This property is important in multimodal single-cell data analysis, since some biological processes are captured by only one of the datasets. In general applications, other methods can be used for the alignment and dimension reduction modules.
Full Text Available

